12 research outputs found

    Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization

    This work investigates the effectiveness of different pseudonymization techniques, ranging from rule-based substitutions to using pre-trained Large Language Models (LLMs), on a variety of datasets and models used for two widely used NLP tasks: text classification and summarization. Our work provides crucial insights into the gaps between original and anonymized data (focusing on the pseudonymization technique) and model quality and fosters future research into higher-quality anonymization techniques to better balance the trade-offs between data protection and utility preservation. We make our code, pseudonymized datasets, and downstream models publicly available. Comment: 10 pages. Accepted for TrustNLP workshop at ACL202

    Negative Barnett effect, negative moment of inertia of (quark-)gluon plasma and thermal evaporation of chromomagnetic condensate

    We discuss the negativity of the moment of inertia of (quark-)gluon plasma in a window of "supervortical" range of temperatures above the deconfining phase transition, $T \simeq (1\dots 1.5) T_c$, found recently in numerical Monte Carlo simulations by two independent methods. In our work, we confirm numerically that the origin of this effect is rooted in the thermal evaporation of the non-perturbative chromomagnetic condensate. We argue that the negative moment of inertia of gluon plasma indicates the presence of a novel effect, the negative spin-vortical coupling for gluons resulting in a negative gluonic Barnett effect: the spin polarization of gluons exceeds the total angular momentum of rotating plasma, thus forcing the orbital angular momentum to take negative values in the supervortical range of temperatures. Comment: 9 pages, 3 figures

    Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings

    No full text
    International audience. Detection of difficult-to-understand words is a crucial task for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. In this paper, we study the use of recently developed word embeddings, which contain context information for words, together with other linguistic and non-linguistic features, for improving the detection of difficult medical words. We propose new cross-validation scenarios in order to test the generalization ability of medical word difficulty detection from different perspectives, and we provide an experimental study of previously used feature extraction methods together with the recently proposed FastText embeddings. We found that, for known words and unknown users, FastText embeddings improve the detection of word understandability, reaching an F-score of 85.9 (an improvement of up to 2.9 F-score points).

    Generalizability of readability models for medical terms

    No full text
    International audience. Detection of difficult-to-understand words is a crucial task for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. We propose to combine supervised machine learning algorithms using various features with word embeddings, which contain context information of words. Data in French are manually cross-annotated by seven annotators. On the basis of these data, we propose cross-validation scenarios in order to test the generalization ability of models to detect the difficulty of medical words. On data provided by seven annotators, we show that the models are generalizable from one annotator to another.

    RNN embeddings for identifying difficult to understand medical words

    No full text
    International audience. Patients and their families often require a better understanding of medical information provided by doctors. We address this issue by improving the identification of difficult-to-understand medical words. We introduce novel embeddings obtained from an RNN, FrnnMUTE (French RNN Medical Understandability Text Embeddings), which reach up to 87.0 F1 score in the identification of difficult words. We also note that adding pre-trained FastText word embeddings to the feature set substantially improves the performance of the model which classifies words according to their difficulty. We study the generalizability of different models through three cross-validation scenarios which allow testing classifiers in real-world conditions: understanding of medical words by new users, and classification of new unseen words by the automatic models. The RNN - FrnnMUTE embeddings and the categorization code are being made available for the research

    Negative moment of inertia and rotational instability of gluon plasma

    No full text
    8 pages, 2 figures. Using first-principle numerical simulations of the lattice SU(3) gauge theory, we calculate the isothermal moment of inertia of the rigidly rotating gluon plasma. We find that the moment of inertia unexpectedly takes a negative value below the "supervortical temperature" $T_s = 1.50(10) T_c$, vanishes at $T = T_s$, and becomes a positive quantity at higher temperatures. The negative moment of inertia indicates a thermodynamic instability of rigid rotation. We derive the condition of thermodynamic stability of the vortical plasma and show how it relates to the scale anomaly and the magnetic gluon condensate. The rotational instability of gluon plasma shares a striking similarity with the rotational instabilities of spinning Kerr and Myers-Perry black holes.